“In the name of God, I, take you, to be my husband/wife, to have and to hold from this day forward, for better or worse, for richer or poorer, in sickness and in health, to love and to cherish all the days of my life. This is my solemn vow.”




So you still believe in fairy tale?

Time to wake up.

1 Introduction

In this project, we seek to find out features that may affect people’s marriage status. Are they truly loyal to their wedding vows? Or are there certain patterns or factors that may lead their marriage to end?

We choose ‘marital status’ and ‘number of times married’ as measurement of a person’s performance on marriage. By relating this to other factors, such as income, eduction level, or health status, we wish to find some patterns upon them.

We may find answers for some interesting questions, for example, does high income help people keep marriage steady? Should we listen to parents’ suggestions to break up with boyfriend who has a severe disease or disability? Is it a good idea to get married with a person who has a doctor degree?

Our goal is to find related factors that may bring negative effects to marriage, and avoid them.

2 Data Selection

2.1 Marriage Performance

We tag marriage performance by two measurement: ‘Marital Status’ and ‘Number of Times Married’.

  1. fac = 1 (Good Marriage Performance): the number of times married is one and the marital status is ‘not divorced’ now, we think this person does very well in marriage.

  2. fac = 2 (Poor Marriage Performance): the number of times married is one and the marital status is ‘divorced’ now, or the number of times is two and the marital status is ‘not divorced’ now, we think this person performs not bad in marriage.

  3. fac = 3 (Inferior Marriage Performance): the number of times married is three, or this number is two and the marital status is ‘divorced’ now, we think this person does very poor in marriage.

databad <- data_ori[(MAR == 3 | MARHT >= 2)]
datagood <- data_ori[(MAR != 3 & MARHT < 2)]

#Define a good marriage attitude
datagood[, fac := 1, ]

#Define a poor marriage attitude
databad1 <- databad[!(MARHT >= 3 | (MARHT == 2 & MAR == 3))]
databad1[, fac := 2, ]

#Define a inferior marriage attitude
databad2 <- databad[MARHT >= 3 | (MARHT == 2 & MAR == 3)]
databad2[, fac := 3, ]

data_f <- rbind(datagood, databad1, databad2)

2.2 Disability Record

  1. DIS = 1: with a disability
  2. DIS = 2: without a disability
data_dis <- data_final[!is.na(DIS), list(DIS, fac)]

2.3 Race Code

  1. RAC1P = 1: White alone
  2. RAC1P = 2: Black or African American alone
  3. PAC1P = 3: American Indian alone
  4. PAC1P = 4: Asian alone
data_RAC1P <- data_final[RAC1P==1|RAC1P==2|RAC1P==3|RAC1P==6, list(RAC1P, fac)]

2.4 Educational Attainment

  1. SCHL = 1: No schooling completed
  2. SCHL = 16: Regular high school diploma
  3. SCHL = 21: Bachelor’s degree
  4. SCHL = 22: Master’s degree
  5. SCHL = 24: Doctorate degree
data_SCHL <- data_final[SCHL==1|SCHL==16|SCHL==21|SCHL==22|SCHL==24, list(SCHL, fac)]

2.5 Wages or Salary Income Past 12 Months

data_WAGP <- data_final[!is.na(WAGP), list(WAGP, fac)]

2.6 Veteran Period of Service

  1. VPS = 1: Gulf War: 9/2001 or later
  2. VPS = 6: Vietnam Era
  3. VPS = 11: WWII
data_VPS <- data_final[VPS==1|VPS==6|VPS==11, list(VPS, fac)]

2.7 Military Service

  1. MIL = 1: Now on active duty
  2. MIL = 2: On active duty in the past, but not now
  3. MIL = 3: Only on active duty for training in Reserves/National Guard
  4. MIL = 4: Never served in the military
data_MIL <- data_final[!is.na(MIL), list(MIL, fac)]

3 Weight Incorporation

We removed all NA and balanced weight for the sample data.

data_all_var <- data_final[!is.na(DIS) & (RAC1P==1|RAC1P==2|RAC1P==3|RAC1P==6) & (SCHL==1|SCHL==16|SCHL==21|SCHL==22|SCHL==24) & !is.na(WAGP) & !is.na(MIL), list(DIS, RAC1P, SCHL, WAGP, MIL, PWGTP, fac)]
#load disability file
allvar <- fread('updated_data_all_var.csv') 
allvar <- data.frame(allvar)

#clustering based on wage
allvar$WAGP <- as.array(kmeans(allvar$WAGP, 10, nstart = 20)$cluster)
#head(allvar)

# 1.DIC 2. RAC1P  3.SCHL 4.WAGE 5.MIL 6.PWGTP 7.fac

#######
#1.wage -> dis
f.dis <- function(col1){ 
  a <- sort(unique(allvar[,col1]))
  b <- sort(unique(allvar$DIS))
  results <- matrix(NA, nrow = length(a), ncol = length(b) )
  cou = 1
  for (i in 1:length(a)){
    flow <- count(allvar[which(allvar[,col1]==a[i]),],DIS, wt = PWGTP)
    colnames(results) <- sort(b)
    rownames(results) <- sort(a)
    results[cou,] <-flow$n
    cou <- cou + 1
  }
  return(results)
}

# Wage to fac
f.dis(4)

#######
#2.dis -> edu
unique(allvar$DIS)
f.edu <- function(col1){ 
  a <- sort(unique(allvar[,col1]))
  b <- sort(unique(allvar$SCHL))
  results <- matrix(NA, nrow = length(a), ncol = length(b) )
  cou = 1
  for (i in 1:length(a)){
    flow <- count(allvar[which(allvar[,col1]==a[i]),],SCHL, wt = PWGTP)
    colnames(results) <- sort(b)
    rownames(results) <- sort(a)
    results[cou,] <-flow$n
    cou <- cou + 1
  }
  return(results)
}

# Dic to edu
f.edu(1)

#######
#3. edu -> race
unique(allvar$SCHL)
unique(allvar$RAC1P)
f.rac <- function(col1){ 
  a <- sort(unique(allvar[,col1]))
  b <- sort(unique(allvar$RAC1P))
  results <- matrix(NA, nrow = length(a), ncol = length(b) )
  cou = 1
  for (i in 1:length(a)){
    flow <- count(allvar[which(allvar[,col1]==a[i]),],RAC1P, wt = PWGTP)
    colnames(results) <- sort(b)
    rownames(results) <- sort(a)
    results[cou,] <-flow$n
    cou <- cou + 1
  }
  return(results)
}

# Edu to race
f.rac(3)

#######
#4. race -> millitary
unique(allvar$RAC1P)
unique(allvar$MIL)
f.mil <- function(col1){ 
  a <- sort(unique(allvar[,col1]))
  b <- sort(unique(allvar$MIL))
  results <- matrix(NA, nrow = length(a), ncol = length(b) )
  cou = 1
  for (i in 1:length(a)){
    flow <- count(allvar[which(allvar[,col1]==a[i]),],MIL, wt = PWGTP)
    colnames(results) <- sort(b)
    rownames(results) <- sort(a)
    results[cou,] <-flow$n
    cou <- cou + 1
  }
  return(results)
}

# race to millitary
f.mil(2)

########################################
# each variable numbers flow to fac
f <- function(col1){ #col1 ->col2
  a <- sort(unique(allvar[,col1]))
  b <- sort(unique(allvar$fac))
  results <- matrix(NA, nrow = length(a), ncol = length(b) )
  cou = 1
  for (i in 1:length(a)){
    flow <- count(allvar[which(allvar[,col1]==a[i]),],fac, wt = PWGTP)
    colnames(results) <- sort(b)
    rownames(results) <- sort(a)
    results[cou,] <-flow$n
    cou <- cou + 1
  }
  return(results)
}

4 Data Visualization

## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:graphics':
## 
##     layout
## Loading required package: rCharts
## Loading required package: plyr

4.1 ‘For Richer or Poorer’ — Wages VS Marriage Performance

a. Boxplot

#plot
##wages: 
# 1.box-plots for wage
p <- plot_ly(ggplot2::diamonds, y = data_raw$WAGP  , color = data_raw$fac , type = "box")
p

b. Bar Chart

# 1. bar chart for disability
p <- plot_ly(
  x = c('with disability', 'without disability'),
  y = c(7924298,50876772),
  name = 'good marriage performance',
  type = 'bar'
)
p2 <- add_trace(
  p,
  x = c('with disability', 'without disability'),
  y = c(3865462,18513842),
  name = 'poor marriage performance',
  type = 'bar'
)
p3 <- add_trace(
  p2,
  x = c('with disability', 'without disability'),
  y = c(1568850, 5045137),
  name = 'inferior marriage performance',
  type = 'bar'
)
p4 <- layout(p3, barmode = "stack")
p4
# 2. bar chart for wage
p <- plot_ly(
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(377034, 25113657,   5645546,    7056107,    944659, 383814, 4157641,    2427844,    6522414,    6172354),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(114668, 9748396,    2379307,    2767572,    257943, 95312,  1315064,    696605, 2822095,    2182342),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(30508,  3396854,    696597, 708733, 46991,  18961,  278326, 161293, 758216, 517508),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

4.2 ‘No Matter Who You Are’ — Races VS Marriage Performance

a. Bar Chart

# 3.bar chart for race
library(plotly)
p <- plot_ly(
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(48622273,4629716,204187,5344894),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(430131,11367450,6906974,3135660,539089),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(148481,3797101,1722793,797134,148478),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

4.3 ‘No Matter How You Are’ — Education Level VS Marriage Performance

a. Bar Chart

##4. barchart for education
library(plotly)
p <- plot_ly(
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(1372324,23810758,2180033,1725501),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(430131,11367450,6906974,3135660,539089),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(148481,3797101,1722793,148478),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

4.4 ‘In Sickness and in Health’ — Military Status VS Marriage Performance

a. Bar Chart

##5. barchart for military
p <- plot_ly(
  x = c("No duty", "Duty in pust", "Only duty for training", "Never served"),
  y = c(202241,4757340,888360,52953129),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("No duty", "Duty in pust", "Only duty for training", "Never served"),
  y = c(50390,2541718,409714,19377482),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("No duty", "Duty in pust", "Only duty for training", "Never served"),
  y = c(7198,1047573,147901,5411315),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot



4.5 Parallel Coordinates Plots

Parallel coordinates is a visualization technique used to plot individual data elements across many dimensions. Each of the dimensions corresponds to a vertical axis and each data element is displayed as a series of connected points along the dimensions/axes. This technique can have an explicit explanation of the corresponding categorical parameters.

## Parallel coordinates plots

subdata = as.data.frame(subdata)
subdata_new = subdata[,c(1,2,3,4,5,7)]
scpcp(subdata_new, sel="data[,6]", sel.palette = "w")

4.6 Interactive Parallel Coordinates Plot

ALghouth, the previous static plot shows the relationships among those categorical variabales, this interactive parallel coordinates plot presents those relationships more explicitly.

(Click me!)

5 Conclusion

6 Next Step